Average complexity of exact and approximate multiple string matching
نویسندگان
چکیده
We show that the average number of characters examined to search for r random patterns of length m in a text of length n over a uniformly distributed alphabet of size σ cannot be less than Ω(n log σ (rm)/m). When we permit up to k insertions, deletions, and/or substitutions of characters in the occurrences of the patterns, the lower bound becomes Ω(n(k+ log σ (rm))/m). This generalizes previous single-pattern lower bounds of Yao (for exact matching) and of Chang and Marr (for approximate matching), and proves the optimality of several existing multipattern search algorithms.
منابع مشابه
Average-Optimal Multiple Approximate String Matching
We present a new algorithm for multiple approximate string matching, based on an extension of the optimal (on average) singlepattern approximate string matching algorithm of Chang and Marr. Our algorithm inherits the optimality and is also competitive in practice. We present a second algorithm that is linear time and handles higher difference ratios. We show experimentally that our algorithms a...
متن کاملAverage-optimal string matching
The exact string matching problem is to find the occurrences of a pattern of length m from a text of length n symbols. We develop a novel and unorthodox filtering technique for this problem. Our method is based on transforming the problem into multiple matching of carefully chosen pattern subsequences. While this is seemingly more difficult than the original problem, we show that the idea leads...
متن کاملOn-Line Approximate String Matching with Bounded Errors
We introduce a new dimension to the widely studied on-line approximate string matching problem, by introducing an error threshold parameter so that the algorithm is allowed to miss occurrences with probability . This is particularly appropriate for this problem, as approximate searching is used to model many cases where exact answers are not mandatory. We show that the relaxed version of the pr...
متن کاملImproved Two-Way Bit-parallel Search
New bit-parallel algorithms for exact and approximate string matching are introduced. TSO is a two-way Shift-Or algorithm, TSA is a two-way Shift-And algorithm, and TSAdd is a two-way Shift-Add algorithm. Tuned Shift-Add is a minimalist improvement to the original Shift-Add algorithm. TSO and TSA are for exact string matching, while TSAdd and tuned Shift-Add are for approximate string matching ...
متن کاملApproximate Boyer-Moore String Matching
The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. Our generalized Boyer-Moore algorithm is shown (under a mild independence assumption) to solve the pro...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Theor. Comput. Sci.
دوره 321 شماره
صفحات -
تاریخ انتشار 2004